A new regression model search algorithm was developed that may be applied to both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The algorithm is a simplified version of a more complex algorithm that was originally developed for the NASA Ames Balance Calibration Laboratory. The new algorithm performs regression model term reduction to prevent overfitting of data. It has the advantage that it needs only about one tenth of the original algorithm’s CPU time for the completion of a regression model search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression model. Therefore, the simplified algorithm is not intended to replace the original algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm fails or requires too much CPU time. Data from a machine calibration of NASA’s MK40 force balance are used to illustrate the application of the new search algorithm.
Nomenclature

AF = axial force
N1 = normal force component at the forward normal force gage of a strain-gage balance
N2 = normal force component at the aft normal force gage of a strain-gage balance
R1 = electrical output of the forward normal force gage
R2 = electrical output of the aft normal force gage
R3 = electrical output of the forward side force gage
R4 = electrical output of the aft side force gage
R5 = electrical output of the rolling moment gage
R6 = electrical output of the axial force gage
RM = rolling moment
S1 = side force component at the forward side force gage of a strain-gage balance
S2 = side force component at the aft side force gage of a strain-gage balance

I. Introduction 


During the past few years, a regression model search algorithm was developed for the NASA Ames Balance Calibration Laboratory. The search algorithm performs regression model term reduction in order to identify regression models of multivariate experimental data sets that meet strict statistical quality requirements and prevent overfitting of the dependent variable (see Refs. [1] to [5] for a description of different aspects of the original search algorithm). The original search algorithm was successfully implemented in NASA’s BALFIT software package (see Ref. [6]). Currently, the regression model search algorithm is applied on a regular basis to wind tunnel strain-gage balance calibration data sets at both NASA Ames Research Center and NASA Langley Research Center.

Experience over the years has shown that the original regression model search algorithm is not always able to complete a search successfully. A regression model search fails whenever the algorithm detects a discontinuity in its search metric, i.e., the standard deviation of the PRESS residuals of the data. Several causes for the failure of a regression model search were identified. They were related (i) to search restrictions imposed by the algorithm itself or (ii) to shortcomings and imperfections in experimental data sets. Consequently, the search restrictions of the original algorithm were relaxed by giving an analyst a greater choice of (i) math term group combinations, (ii) search constraint options, and (iii) search directions. In addition, BALFIT’s analysis report format was improved so that shortcomings in experimental data sets can be spotted more easily.

In 2010, after analyzing problems associated with a sparse wind tunnel balance calibration data set, it was concluded that the application of the search algorithm is also sometimes unsuccessful because tare load corrections have to be applied to some balance data sets before the regression model search can be started (see Ref. [7] for a detailed description of the tare load iteration process). Unfortunately, a sufficiently accurate tare load correction can often only be obtained if a good estimate of the final optimized regression model of the balance calibration data is already known (a “chicken-and-egg” problem). Therefore, the author attempted to simplify several elements of the original regression model search algorithm such that the CPU time needed for the search would be significantly reduced while still enforcing the three search constraints of the original search algorithm (i.e., the variance inflation factor constraint, the p-value constraint, and the hierarchy rule constraint). These efforts led to the development of the simplified regression model search algorithm that is discussed in the present paper.

Important elements of the simplified regression model search algorithm are briefly summarized in the 
next section. Then, data from the machine calibration of a wind tunnel strain-gage balance is used as an 
example in order (i) to determine the CPU time that the simplified algorithm needs for the completion of a 
typical regression model search and (ii) to assess the predictive capability of its optimized regression models. 

II. Simplified Search Algorithm 

The simplified regression model search algorithm was derived from a more complex algorithm that was developed for the NASA Ames Balance Calibration Laboratory (see Refs. [1] to [4] for details about the original algorithm). The development of the simplified search algorithm became necessary because some wind tunnel strain-gage balance calibration data sets require tare corrections before the original, more complex search algorithm can be applied. Accurate tare corrections, however, can often only be obtained if a good initial “guess” of the final optimized regression model is already available before the beginning of the search. This “guess” has to be determined within a fraction of the CPU time that the original algorithm needs for the completion of a regression model search. The simplified search algorithm was designed to fulfill all these requirements.

The simplified search algorithm has matured since it was first developed in 2011. It is now included in 
NASA’s BALFIT software package as a new math model type choice that an analyst can select for analysis 
(i.e., the math model type choice Suggested Math Model). The simplified search algorithm uses many of 
the statistical quality metrics and constraints that the original search algorithm applies (see Ref. [5] for 
a general discussion of different metrics that may be used to evaluate a regression model of experimental 
data). However, the quality metrics and constraints are applied differently in order to reduce the total 
number of numerical operations that have to be performed during a typical regression model search. These 
differences are explained in more detail below. In addition, the simplified search algorithm omits the search 
metric minimization that the original algorithm performs (see Ref. [3] for details about the search metric 
minimization used by the original algorithm). 

The flowchart in Fig. 1 summarizes basic elements of the simplified search algorithm. The search starts by first selecting a combination of math term groups, i.e., function classes, that help define an upper limit for the regression model search. The analyst should select this math term group combination by using subject matter knowledge about the given experimental data set. For example, the chosen combination may consist of linear terms, cross-product terms, square terms, and absolute value terms. Math term groups suggested for the analysis of wind tunnel strain-gage balance calibration data are listed in Ref. [7].
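To make the idea of a math term group combination concrete, the Python sketch below expands a chosen set of groups into candidate term names for a list of independent variables. The group labels ("F", "|F|", "F*F", "F*|F|", "F*G") and the function build_terms are hypothetical illustrations, not BALFIT's input format; they mirror the five groups used for the MK40 example in Section III.

```python
from itertools import combinations


def build_terms(variables, groups=("F", "|F|", "F*F", "F*|F|", "F*G")):
    """Expand math term groups into candidate term names (hypothetical helper).

    The intercept is always listed first; F and G denote two distinct variables.
    """
    terms = ["INTERCEPT"]
    if "F" in groups:
        terms += list(variables)                                      # linear terms
    if "|F|" in groups:
        terms += [f"|{f}|" for f in variables]                        # absolute value terms
    if "F*F" in groups:
        terms += [f"{f}*{f}" for f in variables]                      # square terms
    if "F*|F|" in groups:
        terms += [f"{f}*|{f}|" for f in variables]                    # F times |F|
    if "F*G" in groups:
        terms += [f"{f}*{g}" for f, g in combinations(variables, 2)]  # cross products
    return terms


loads = ["N1", "N2", "S1", "S2", "RM", "AF"]
terms = build_terms(loads)
print(len(terms))  # 40 = 1 + 4 * 6 + 15
```

With six load components, each of the four single-variable groups contributes six terms and the cross-product group contributes 15, so the upper limit before any SVD screening has 40 terms.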
Not every possible term of the chosen math term groups may be supported by the given experimental data set. Therefore, the upper bound of the math model, i.e., the so-called Permitted Math Model, needs to be determined so that only non-singular solutions of the least squares problem can exist when different regression models are tested during the search. This requirement is enforced by using a numerical technique called Singular Value Decomposition (SVD). After completion of the SVD process, the Permitted Math Model is known. It defines the upper bound of the search space that the simplified regression model search algorithm needs for the identification of the optimized math model.
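The text does not spell out exactly how the SVD is used to arrive at the Permitted Math Model. One plausible approach, shown here as a Python sketch under that assumption, adds candidate columns one at a time and keeps a column only if the singular values of the trial design matrix show that it increases the numerical rank. The function name permitted_terms and the relative tolerance rtol are hypothetical choices.

```python
import numpy as np


def permitted_terms(X, names, rtol=1e-10):
    """Greedy SVD rank screening of candidate columns (hypothetical sketch).

    A column is kept only if the trial matrix stays full column rank, judged by
    comparing its smallest singular value with the largest one.
    """
    kept = []
    for j in range(X.shape[1]):
        trial = X[:, kept + [j]]
        s = np.linalg.svd(trial, compute_uv=False)  # singular values, descending
        full_rank = len(s) == trial.shape[1] and s[0] > 0.0 and s[-1] > rtol * s[0]
        if full_rank:
            kept.append(j)
    return [names[j] for j in kept]


# One-component-at-a-time loading: N1 and N2 are never applied together,
# so the N1*N2 column is identically zero and cannot be supported by the data.
n1 = np.array([1.0, -1.0, 0.0, 0.0, 2.0, -2.0])
n2 = np.array([0.0, 0.0, 1.0, -1.0, 0.0, 0.0])
X = np.column_stack([np.ones(6), n1, n2, np.abs(n1), n1 * n1, n1 * n2])
names = ["INTERCEPT", "N1", "N2", "|N1|", "N1*N1", "N1*N2"]
print(permitted_terms(X, names))  # ['INTERCEPT', 'N1', 'N2', '|N1|', 'N1*N1']
```

The greedy order means that, among mutually dependent columns, the one listed first survives; a production implementation may resolve such ties differently.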

In the next step, the variance inflation factor constraint is applied in order to prevent near-linear dependencies in the optimized regression model. The variance inflation factor constraint requires that the largest variance inflation factor (VIF) of a tested math model be less than a threshold. The literature recommends a conservative threshold of 10 for the near-linear dependency test (see, e.g., Ref. [8], p. 111; some analysts believe that a threshold of up to 50 is still acceptable). If this condition is met, the tested math model is considered free of unwanted near-linear dependencies. The simplified algorithm enforces the VIF constraint iteratively. At first, the VIFs of all terms of the Permitted Math Model are computed (Step 1). Then, the term with the maximum VIF is identified and removed if its VIF exceeds the chosen threshold (Step 2). Now, the VIFs of all terms of the updated math model are computed (Step 3). Finally, Step 2 and Step 3 are repeated until the largest VIF of the remaining math model is below the VIF threshold (Step 4). The remaining math model is a first estimate of the optimized regression model (Step 5).
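Steps 1 to 5 can be sketched in Python as follows. The sketch assumes that the regression model contains an intercept, so the VIFs of the remaining terms can be read from the diagonal of the inverse of their correlation matrix; this is a standard identity, equivalent to computing 1/(1 - R_j^2) after regressing term j on all other terms. The function names and the default threshold of 10 (the literature value quoted above) are illustrative choices, not BALFIT's implementation.

```python
import numpy as np


def vifs(X):
    """VIFs of the columns of X (the intercept column must not be included in X).

    For a model with intercept, VIF_j is the j-th diagonal element of the
    inverse of the correlation matrix of the terms.
    """
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))


def vif_reduction(X, names, threshold=10.0):
    """Iterative VIF constraint (Steps 1-5); returns the first estimate."""
    cols = list(range(X.shape[1]))
    while len(cols) > 1:
        v = vifs(X[:, cols])          # Step 1 (first pass) / Step 3 (later passes)
        k = int(np.argmax(v))
        if v[k] < threshold:          # Step 4: stop once the largest VIF is below the threshold
            break
        del cols[k]                   # Step 2: remove the term with the largest VIF
    return [names[c] for c in cols]   # Step 5: first estimate of the optimized model


rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1.0, 1.0, size=(2, 200))
x3 = x1 + x2 + 0.01 * rng.standard_normal(200)   # nearly a linear combination of x1 and x2
X = np.column_stack([x1, x2, x3])
first_estimate = vif_reduction(X, ["x1", "x2", "x3"])
print(first_estimate)  # two of the three terms survive
```

Note that the VIFs depend only on the regression model terms, not on the dependent variable, so no least squares fit of the gage outputs is needed in this loop.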

If needed, the first estimate of the optimized regression model is made “hierarchical” by adding missing lower order terms. Enforcing the “hierarchy” rule in the first estimate guarantees that the math model can correctly represent possible “hidden” constant shifts in the independent variables.
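A minimal Python sketch of the hierarchy rule is given below. It assumes that each term is stored as a sorted tuple of factor names, e.g., ("N1", "S1") for N1*S1 and ("N1", "|N1|") for N1*|N1|, with the empty tuple standing for the intercept, and that the lower order terms of a term are the products obtained by dropping one factor. BALFIT's additional equivalences between absolute value terms (e.g., |F|*|F| = F*F) are not modeled here; the helper names are hypothetical.

```python
def lower_order_terms(term):
    """Terms obtained by dropping one factor; () denotes the intercept."""
    return {term[:i] + term[i + 1:] for i in range(len(term))}


def make_hierarchical(model):
    """Add missing lower order terms until every term's parents are present."""
    result = set(model)
    pending = list(result)
    while pending:
        for parent in lower_order_terms(pending.pop()):
            if parent not in result:
                result.add(parent)
                pending.append(parent)
    return result


first_estimate = {("N1", "S1"), ("N2", "|N2|")}
print(sorted(make_hierarchical(first_estimate), key=lambda t: (len(t), t)))
```

Because the factor tuples are sorted, dropping one factor keeps the parent tuple sorted, so set membership tests work without renormalizing.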

Finally, the p-values of the t-statistic of all regression coefficients of the “hierarchical” first estimate of the optimized regression model are computed. The calculation of the p-values of the coefficients helps identify and remove statistically “insignificant” terms. A term is removed from the first estimate of the optimized regression model if its p-value exceeds the literature-recommended threshold of 0.001 (the threshold is taken from Ref. [8], p. 85; some analysts prefer the more conservative threshold of 0.0001). The p-values are processed by examining higher order terms before lower order terms. In addition, a term is only removed if the math model stays “hierarchical” after its removal. The resulting regression model is the final optimized regression model that the simplified regression model search algorithm recommends for the regression analysis of the given data.
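The p-value step can be sketched in Python as below. Following the description above, the coefficients and their p-values are computed from a single least squares fit of the “hierarchical” first estimate, and terms are then examined from higher to lower order. Two assumptions are worth stating loudly: terms are sorted tuples of factor names (with () as the intercept), and the two-sided p-values use a normal approximation to the t-distribution, which is only reasonable when the residual degrees of freedom are large (the MK40 data set has 1863 points). All function names are hypothetical.

```python
import math

import numpy as np


def coefficient_p_values(X, y):
    """Two-sided p-values of the coefficient t-statistics of one least squares fit.

    Normal approximation to the t-distribution: reasonable only when the number
    of residual degrees of freedom (rows minus terms) is large.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)                        # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))  # coefficient standard errors
    return [math.erfc(abs(b / e) / math.sqrt(2.0)) for b, e in zip(beta, se)]


def is_hierarchical(model):
    """True if, for every term, each one-factor-dropped parent is in the model."""
    return all(t[:i] + t[i + 1:] in model for t in model for i in range(len(t)))


def remove_insignificant(X, y, terms, threshold=0.001):
    """p-value constraint applied to the 'hierarchical' first estimate (single fit)."""
    p_value = dict(zip(terms, coefficient_p_values(X, y)))
    model = set(terms)
    for term in sorted(terms, key=len, reverse=True):    # higher order terms first
        if term and p_value[term] > threshold and is_hierarchical(model - {term}):
            model.remove(term)                           # intercept () is never removed
    return model


rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 201)
y = 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)   # no true linear term
terms = [(), ("x",), ("x", "x")]
X = np.column_stack([np.ones_like(x), x, x**2])
print(remove_insignificant(X, y, terms))
```

In this usage example the linear term is kept whatever its p-value turns out to be, because removing it would leave the significant x*x term without its lower order parent.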

Extensive studies of the performance of the simplified regression model search algorithm have shown that the CPU time needed for a typical search drops by about one order of magnitude when compared with the CPU time that the original algorithm requires. This advantage of the simplified algorithm becomes important whenever large data sets need to be processed. The significant reduction in CPU time is possible because the simplified algorithm performs the iterative reduction of math terms exclusively with the VIFs, which depend only on the regression model terms and, therefore, do not require a least squares fit of the dependent variable. Afterwards, the removal of “insignificant” terms is performed by analyzing a single math model, i.e., the “hierarchical” first estimate of the optimized math model (see also Fig. 1). The original regression model search algorithm, on the other hand, minimizes a search metric and enforces constraints at the same time. Therefore, it has to compute the search metric, the VIFs, and the p-values of the t-statistic of the regression coefficients for each regression model that is tested during a search. This alternate approach requires more numerical operations, i.e., more CPU time, because the calculation of the p-values is always preceded by a least squares fit of the entire data set.

The next section of the paper discusses the regression analysis of a wind tunnel strain-gage balance 
calibration data example in order to illustrate different aspects of the application of the simplified search 
algorithm. 


III. Discussion of Example 

Data from the calibration of NASA’s MK40 wind tunnel strain-gage balance were chosen to illustrate the application of the simplified regression model search algorithm. The MK40 balance was manufactured by the Task Corporation (see Ref. [7] for a discussion of strain-gage balances). It is a six-component force balance that measures five forces (N1, N2, S1, S2, AF) and one moment (RM). The balance has a diameter of 2.5 inches and a length of 17.31 inches. Table 1 shows the load capacity of each load component.
Table 1: Load capacities of NASA’s 2.5-inch MK40 balance.

              N1, lbs   N2, lbs   S1, lbs   S2, lbs   RM, in-lbs   AF, lbs
  CAPACITY    3500      3500      2500      2500      8000         400


The MK40 balance was calibrated in 2008 in Triumph Aerospace’s Automatic Balance Calibration 
System (ABCS). Triumph’s ABCS is a balance calibration machine that can apply complex combined loadings 
to a balance. These combined loadings are needed for an appropriate characterization of interactions between 
the six balance gages. The final balance calibration data set consisted of the applied loads and the measured 
electrical outputs of the strain-gages of the balance. The supplied calibration data was already corrected for 
the tare loads caused by the metric part of the balance and other calibration fixtures. Therefore, the tare 
load iteration process was omitted during the analysis of the data. Table 2 below summarizes important 
characteristics of the balance and its calibration data set. 

Table 2: Balance and calibration data set characteristics of the MK40 balance.

  BALANCE TYPE                          FORCE BALANCE
  DIAMETER                              2.5 [in]
  CALIBRATION DATE                      JULY 2008
  CALIBRATION METHOD                    MACHINE CALIBRATION
  TOTAL NUMBER OF CALIBRATION POINTS    1863
  LOAD FORMAT                           TARE CORRECTED LOADS
  GAGE OUTPUT FORMAT                    GAGE OUTPUT DIFFERENCES


It was decided to analyze the balance calibration data by applying the Iterative Method. This approach 
fits the strain-gage outputs as a function of the applied balance loads. Afterwards, an iteration scheme 
is applied so that loads can be predicted from the measured gage outputs during a wind tunnel test. For 
simplicity, only the fit of the gage outputs as a function of the applied balance loads is discussed in the 
paper. Detailed information about the load iteration scheme may be found in Ref. [7]. 
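The Iterative Method itself is described in Ref. [7]. As an illustration only, the Python sketch below assumes that the fitted gage output models can be written as R = c0 + C1·F + C2·h(F), where c0 holds the intercepts, C1 is a square matrix of linear load coefficients, and C2 multiplies a vector h(F) of the remaining nonlinear terms. Loads are then recovered from measured outputs by the fixed-point iteration F(k+1) = C1^-1 [R - c0 - C2·h(F(k))], starting from the linear solution. All names, the toy coefficients, and the convergence tolerance are hypothetical.

```python
import numpy as np


def predict_loads(r, c0, C1, C2, h, tol=1e-10, max_iter=100):
    """Fixed-point load iteration F = C1^-1 (r - c0 - C2 h(F)) (illustrative sketch)."""
    C1_inv = np.linalg.inv(C1)
    f = C1_inv @ (r - c0)                       # start from the linear solution
    for _ in range(max_iter):
        f_new = C1_inv @ (r - c0 - C2 @ h(f))
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    raise RuntimeError("load iteration did not converge")


def h(f):
    """Toy nonlinear terms: element-wise squares of the loads."""
    return f**2


c0 = np.array([0.5, -0.2])
C1 = np.array([[1.0, 0.1], [0.05, 1.0]])
C2 = 0.01 * np.eye(2)
f_true = np.array([2.0, -1.0])
r = c0 + C1 @ f_true + C2 @ h(f_true)           # "measured" gage outputs
print(predict_loads(r, c0, C1, C2, h))          # close to [2.0, -1.0]
```

The iteration converges quickly here because the nonlinear terms are small compared with the linear terms, which is the situation one would expect for a very linear device such as a Task balance.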

First, in order to start the search for the optimized regression models of the six gage outputs of the balance, it was necessary to define an upper bound, i.e., the Permitted Math Model. The balance is a multi-piece balance. Therefore, as the data is to be processed using the Iterative Method, it was decided to include absolute value terms in the math term group combination for the regression models of the six gage outputs of the balance. The following five groups were chosen for the regression models of the gage outputs:

$$\text{Group 1} = F \qquad \text{Group 2} = |F| \qquad \text{Group 3} = F^2 \qquad \text{Group 4} = F \cdot |F| \qquad \text{Group 5} = F \cdot G$$

The chosen math term groups are a subset of the groups that are recommended in Ref. [7] for the analysis of strain-gage balance calibration data. The symbols F and G represent the load components of the balance (i.e., N1, N2, S1, ...). The balance calibration data set was obtained in a calibration machine that supports all load combinations that can be constructed from the five chosen math term groups. Therefore, after including the intercept in the regression models of the gage outputs, the initial upper bound for the search has a total of 40 terms for each gage output (1 intercept, 6 linear, 6 absolute value, 6 square, 6 F·|F|, and 15 cross-product terms). The corresponding regression models of the gage outputs R1 to R6 are shown in Fig. 2. The individual terms of the six regression models are constructed from the given balance loads.

Specific knowledge about the characteristics of a Task balance may be used to further reduce the number of regression model terms in the initial upper bound. A recent paper showed that not all gages of a Task balance have the bi-directional characteristic that justifies the selection of absolute value terms in the regression model of calibration data of a Task balance (see Ref. [9]). Figure 3a, for example, shows the bi-directionality characteristic of the MK40 balance by using the indicator variable that is defined in Ref. [9]. It can clearly be seen that the rolling moment gage output R5 and the axial force gage output R6 are not bi-directional when plotted versus the corresponding primary gage load (see the two plots near the bottom of Fig. 3a). The plots of the indicator variable for the outputs R5 and R6 simply do not look like an absolute value function, as the indicator variable at capacity stays below an empirical threshold. Consequently, there is no justification for the use of the terms |AF| and |RM| in the regression models of the gage outputs of the MK40 balance. A revised 36-term upper bound of the regression models is obtained after removing the four corresponding terms (|RM|, |AF|, RM·|RM|, and AF·|AF|) from the initial upper bound (see Fig. 3b).
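The screening decision can be reproduced from the threshold and indicator values printed in Fig. 3a. The Python sketch below assumes that a gage is treated as bi-directional when the indicator variable at capacity lies outside the symmetric ± threshold band; the definition of the indicator variable itself is given in Ref. [9] and is not reproduced here. It then drops every term that contains |F| for a non-bi-directional load F. The dictionary name FIG_3A and the helper is_bidirectional are hypothetical.

```python
from itertools import combinations

# (threshold, indicator value at capacity) in microV/V, as printed in Fig. 3a.
FIG_3A = {
    "N1": (5.9, 15.7),
    "N2": (7.2, 23.5),
    "S1": (6.2, 28.0),
    "S2": (6.7, 32.7),
    "RM": (6.6, -0.844),
    "AF": (7.4, -2.5),
}


def is_bidirectional(threshold, value_at_capacity):
    # Assumption: bi-directional if the value lies outside the +/- threshold band.
    return abs(value_at_capacity) > threshold


loads = list(FIG_3A)
initial = (["INTERCEPT"] + loads
           + [f"|{f}|" for f in loads]                          # absolute value terms
           + [f"{f}*{f}" for f in loads]                        # square terms
           + [f"{f}*|{f}|" for f in loads]                      # F times |F|
           + [f"{f}*{g}" for f, g in combinations(loads, 2)])   # cross products
not_bidirectional = [f for f, (thr, val) in FIG_3A.items() if not is_bidirectional(thr, val)]
revised = [t for t in initial if not any(f"|{f}|" in t for f in not_bidirectional)]
print(not_bidirectional, len(initial), len(revised))  # ['RM', 'AF'] 40 36
```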

An initial analysis of the variance inflation factors of the 36-term upper bound for the regression model of the forward normal force gage output R1 is shown in Fig. 3c. A moderate near-linear dependency between the absolute value terms of the normal/side force components and the corresponding quadratic terms appears to exist, as terms of those two function classes have variance inflation factors between the literature-recommended threshold of 10 and an intermediate threshold of 50 (see the two red boxes in Fig. 3c). The connection between the absolute value and quadratic terms comes from the fact that both the absolute value function and a parabola are even functions. The variance inflation factor analysis of the MK40 calibration data indicates that either the absolute value terms or the quadratic terms of the normal/side force components may be omitted in the regression models. A Task balance is, by design, a very linear device. Therefore, it was decided to omit the quadratic terms instead of the absolute value terms of the normal and side force components. The corresponding second revision of the upper bound for the regression model search is shown in Fig. 3d.
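The moderate VIFs of the absolute value and quadratic terms can be anticipated from an idealized case. If a load x is distributed uniformly over a symmetric range, normalized here to [-1, 1], then cov(|x|, x^2) = 1/4 - 1/6 = 1/12, var(|x|) = 1/3 - 1/4 = 1/12, and var(x^2) = 1/5 - 1/9 = 4/45. The squared correlation is therefore 135/144, and a model containing only these two terms (plus an intercept) has VIF = 1/(1 - 135/144) = 16 for both terms, which lies between the thresholds of 10 and 50. The Python check below uses a dense uniform grid; real calibration loadings are not uniform, so the MK40 values in Fig. 3c differ from this idealized number.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 200001)        # dense uniform grid over a symmetric load range
r = np.corrcoef(np.abs(x), x**2)[0, 1]    # correlation of |x| and x^2
vif = 1.0 / (1.0 - r**2)                  # two-term VIF (identical for both terms)
print(round(r**2, 4), round(vif, 2))      # approximately 0.9375 and 16
```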

Now, the original and the simplified search algorithms can be applied to the MK40 calibration data set by using the reduced math model shown in Fig. 3d as the upper bound for the regression models of the gage outputs. Figure 4a shows the optimized models of the six gage outputs that were obtained after applying the original search algorithm to the calibration data. It took approximately 28 minutes of CPU time to complete the search. Figure 4b shows the calibration load residuals of the six load components. The load residuals, i.e., the differences between applied and calculated loads, were obtained after applying the load iteration scheme that can be constructed from the fitted coefficients of the regression models shown in Fig. 4a (see again Ref. [7] for a description of the load iteration process).

Figure 5a shows the optimized regression models of the gage outputs that were obtained after applying 
the simplified search algorithm to the calibration data. In this case, it took approximately 3 minutes of 
CPU time to complete the search. Figure 5b shows the calibration load residuals of the six load components. 
Again, the load residuals were obtained after applying the load iteration scheme that can be constructed 
from the coefficients of the regression models shown in Fig. 5a. 

At this point, it is interesting to compare the standard deviations of the load residuals for the five different regression model sets that were discussed above. This metric may be used for a preliminary assessment of the predictive capability of the five regression model sets whenever independent confirmation points, i.e., points that were not used to fit the regression models, are not available for analysis. Table 3 below lists the corresponding standard deviations as a percentage of the load capacity for the six load components.

Table 3: Standard deviation of the load residuals in percent of the load capacity.

  MATH MODEL   N1         N2         S1         S2         RM         AF
  Fig. 2       0.1008 %   0.0967 %   0.2115 %   0.1762 %   0.1780 %   0.1354 %
  Fig. 3b      0.1101 %   0.1032 %   0.2402 %   0.1822 %   0.2011 %   0.1709 %
  Fig. 3d      0.1207 %   0.1063 %   0.2456 %   0.1931 %   0.2016 %   0.1760 %
  Fig. 4a      0.1216 %   0.1065 %   0.2473 %   0.1986 %   0.2055 %   0.1808 %
  Fig. 5a      0.1229 %   0.1075 %   0.2481 %   0.1942 %   0.2065 %   0.1782 %


Overall, the standard deviations of the load residuals compare very well for the five regression model sets.
The comparison also illustrates that differences between the predictive capability of the regression models of 
the original search algorithm (Fig. 4a) and the regression models of the simplified search algorithm (Fig. 5a) 
are very small if the fitted data itself is chosen to assess their predictive capability. The observed differences 
are less than 0.005 % of the load capacity. 
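The metric in Table 3 and the comparison above can be sketched in Python. The helper assumes that the standard deviation is the sample standard deviation (ddof=1) of the load residuals, expressed as a percentage of the load capacity from Table 1; the exact convention used by BALFIT is not stated here, and the helper name is hypothetical. The Table 3 values for Figs. 4a and 5a are then differenced to confirm that all deviations stay below 0.005 % of capacity.

```python
import numpy as np

CAPACITY = {"N1": 3500.0, "N2": 3500.0, "S1": 2500.0, "S2": 2500.0, "RM": 8000.0, "AF": 400.0}


def residual_std_percent(applied, calculated, capacity):
    """Sample standard deviation of (applied - calculated) in percent of capacity."""
    residuals = np.asarray(applied, dtype=float) - np.asarray(calculated, dtype=float)
    return 100.0 * np.std(residuals, ddof=1) / capacity


# Toy residuals of +1, -1, and 0 lbs for the axial force: std = 1 lb, i.e., 0.25 % of 400 lbs.
print(residual_std_percent([100.0, 200.0, 300.0], [99.0, 201.0, 300.0], CAPACITY["AF"]))

# Table 3 rows (percent of capacity) for the original (Fig. 4a) and simplified (Fig. 5a) models.
fig4a = {"N1": 0.1216, "N2": 0.1065, "S1": 0.2473, "S2": 0.1986, "RM": 0.2055, "AF": 0.1808}
fig5a = {"N1": 0.1229, "N2": 0.1075, "S1": 0.2481, "S2": 0.1942, "RM": 0.2065, "AF": 0.1782}
diff = {k: round(fig5a[k] - fig4a[k], 4) for k in fig4a}
print(diff)  # largest magnitude is S2: -0.0044
```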
IV. Summary and Conclusions


A new simplified regression model search algorithm was described. The new algorithm was derived from 
a more complex algorithm that was originally developed for the NASA Ames Balance Calibration Laboratory. 
The new simplified search algorithm tries to rapidly identify a regression model for a given experimental 
data set that meets strict statistical quality requirements and simultaneously prevents overfitting of the 
data. The simplified search algorithm has the advantage that it requires only about one tenth of the original 
algorithm’s CPU time for the completion of a regression model search. However, the simplified search 
algorithm cannot guarantee that the final optimized regression model will always meet the chosen statistical 
quality requirements. Therefore, the simplified search algorithm is intended to be used for the generation of an 
alternate optimized regression model whenever the application of the original search algorithm fails or requires 
too much CPU time. The simplified algorithm is also useful for an initial assessment of an experimental data
set, as it allows an analyst to catch potential problems sooner. In addition, the simplified algorithm may
be used to get better initial estimates of tare corrected loads of wind tunnel strain-gage balance calibration 
data. These improved estimates are needed as input whenever the original search algorithm is applied to 
strain-gage balance calibration data that has not yet been corrected for the weight of the metric part of the 
balance, the calibration body, and other calibration hardware components. 

Machine calibration data of NASA’s MK40 six-component force balance was used to illustrate the 
application of the simplified search algorithm to a strain-gage balance calibration data set. First, it was 
explained how a suitable upper bound for the regression model search can be defined for the given data set. 
New information about the bi-directionality characteristics of the individual gages of a Task balance was used for this purpose. Consequently, all absolute value terms of the rolling moment and the axial force were omitted in the regression models of the gage outputs. In addition, the quadratic terms of the normal and side force components were omitted because a variance inflation factor analysis indicated a near-linear dependency between these terms and the corresponding absolute value terms. Then, both the original and the simplified search algorithms were applied to the calibration data set using the revised upper bound to limit the regression model search. This analysis demonstrated that the prediction accuracy estimates of the optimized regression models of the original algorithm and of the simplified algorithm are in excellent agreement. However, the simplified algorithm has the key advantage that it can generate an optimized math model much faster than the original algorithm. It required only about one tenth of the original algorithm’s CPU time for the completion of the regression model search.
[Flowchart, starting from the EXPERIMENTAL DATA SET:
 I. SUBJECT MATTER KNOWLEDGE OF ANALYST: select a math term group combination (function class set) that defines the upper limit of the Permitted Math Model.
 II. REMOVAL OF LINEAR DEPENDENCIES BETWEEN MATH TERMS: apply singular value decomposition (SVD) to determine the Permitted Math Model, i.e., the largest math model of the given data that leads to a non-singular least squares fit (upper bound of the regression model search).
 III. REMOVAL OF NEAR-LINEAR DEPENDENCIES BETWEEN MATH TERMS: apply the variance inflation factor (VIF) constraint.
   Step 1: compute the VIFs of the Permitted Math Model (upper bound of the search).
   Step 2: remove the math term with the largest VIF if its VIF exceeds the threshold.
   Step 3: compute the VIFs of the updated math model after the term is removed.
   Step 4: repeat Steps 2 and 3 until the largest VIF is below the threshold.
   Step 5: use the remaining math model as the first estimate of the final math model.
 IV. APPLICATION OF THE “HIERARCHY” RULE: add missing lower order terms to the first estimate.
 V. REMOVAL OF STATISTICALLY “INSIGNIFICANT” MATH TERMS: apply the p-value constraint.
   Step 1: determine the p-values of the t-statistic of all regression coefficients of the “hierarchical” first estimate.
   Step 2: remove terms from the first estimate that exceed the p-value threshold (removal is performed from higher order to lower order terms; a term is only removed if the math model remains “hierarchical”).
 Result: OPTIMIZED MATH MODEL (also called Suggested Math Model).]

Fig. 1 Basic elements of the simplified regression model search algorithm.



American Institute of Aeronautics and Astronautics 



[Term selection matrix of the regression models of the gage outputs R1 to R6. Number of terms = 40, 40, 40, 40, 40, 40; hierarchical: R1, R2, R3, R4, R5, R6 (the hierarchy analysis uses |F*G| = |F|*|G|, |F*F*F| = |F|*|F|*|F|, |F|*|F| = F*F).]

Fig. 2 Initial version of the permitted math model of the MK40 calibration data set.
[Six panels plot the indicator variable A(Ri, load), in microV/V, versus the primary gage load; threshold lines are drawn at ± the listed threshold.
 A(R1,N1) versus N1 (-3500 to 3500 lbs): threshold = ±5.9; indicator variable at capacity = 15.7; bi-directional
 A(R2,N2) versus N2 (-3500 to 3500 lbs): threshold = ±7.2; indicator variable at capacity = 23.5; bi-directional
 A(R3,S1) versus S1 (-2500 to 2500 lbs): threshold = ±6.2; indicator variable at capacity = 28.0; bi-directional
 A(R4,S2) versus S2 (-2500 to 2500 lbs): threshold = ±6.7; indicator variable at capacity = 32.7; bi-directional
 A(R5,RM) versus RM (-8000 to 8000 in-lbs): threshold = ±6.6; indicator variable at capacity = -0.844; not bi-directional
 A(R6,AF) versus AF (-400 to 400 lbs): threshold = ±7.4; indicator variable at capacity = -2.5; not bi-directional]

Fig. 3a Indicator variable for bi-directionality of the MK40 balance.
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NUMBER OF TERMS = 36, 36, 36, 36, 36, 36 

[Fig. 3b table: candidate regression model terms with selection boxes for gage outputs R1 to R6; all six models hierarchical (hierarchy analysis uses |F*G| = |F|*|G|, |F*F*F| = |F|*|F|*|F|, |F|*|F| = F*F).]

Fig. 3b First revision of the permitted math model after removal of terms related to |AF| and |RM|.
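The hierarchy rules printed in the figure header can be turned into a mechanical check. The Python sketch below is one possible reading of those rules, not the authors' implementation: it assumes that a model is hierarchical when every term of order two or higher is accompanied by each product obtained by dropping one of its factors, after the bars are distributed with |F*G| = |F|*|G| and paired bars are collapsed with |F|*|F| = F*F. The function names are hypothetical.

```python
from collections import Counter

# A factor is (variable name, True if it appears inside absolute-value bars).
Factor = tuple[str, bool]


def normalize(factors: list[Factor]) -> tuple[Factor, ...]:
    """Apply |F|*|F| = F*F and return the factors as a canonical sorted tuple."""
    abs_count = Counter(name for name, is_abs in factors if is_abs)
    result = [f for f in factors if not f[1]]
    for name, count in abs_count.items():
        pairs, rest = divmod(count, 2)
        result += [(name, False)] * (2 * pairs) + [(name, True)] * rest
    return tuple(sorted(result))


def parse_term(term: str) -> tuple[Factor, ...]:
    """Parse names such as 'N1', '|N1|', 'N1*|N1|', '|N1|*S1' or '|N1*N2|'."""
    term = term.replace(" ", "")
    inner = term[1:-1]
    if term.startswith("|") and term.endswith("|") and "|" not in inner:
        # |F*G| = |F|*|G| and |F*F*F| = |F|*|F|*|F|: bars distribute over factors
        factors = [(name, True) for name in inner.split("*")]
    else:
        factors = [
            (part[1:-1], True) if part.startswith("|") else (part, False)
            for part in term.split("*")
        ]
    return normalize(factors)


def term_name(term: tuple[Factor, ...]) -> str:
    """Turn a parsed term back into a readable name such as 'N1*|N1|'."""
    return "*".join(f"|{name}|" if is_abs else name for name, is_abs in term)


def missing_parents(term_names: list[str]) -> dict[str, list[str]]:
    """Map each term to the lower-order terms it lacks (empty dict = hierarchical).

    Assumption: the parents of a term are the products obtained by dropping one
    factor; first-order terms depend only on the intercept, which is always kept.
    """
    model = {parse_term(t) for t in term_names if t != "INTERCEPT"}
    missing = {}
    for name in term_names:
        if name == "INTERCEPT":
            continue
        term = parse_term(name)
        if len(term) < 2:
            continue
        parents = {term[:i] + term[i + 1:] for i in range(len(term))}
        gaps = parents - model
        if gaps:
            missing[name] = sorted(term_name(g) for g in gaps)
    return missing


print(missing_parents(["INTERCEPT", "N1", "N2", "|N1|", "N1*|N1|", "N1*N2"]))  # {}
print(missing_parents(["INTERCEPT", "N1", "N1*S1"]))  # {'N1*S1': ['S1']}
```

Under this reading, a cubic term such as |N1*N1*N1| normalizes to N1*N1*|N1| and therefore needs both N1*N1 and N1*|N1| in the model.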


PHYSICAL VARIABLES & UNITS - REGRESSION COEFFICIENT ESTIMATES AND STATISTICAL METRICS (R1)

REGRESSION MODEL HIERARCHY CHARACTERISTICS = HIERARCHICAL

TERM   TERM         COEFFICIENT    STANDARD       T-STATISTIC OF  P-VALUE OF   VIF        VIF
INDEX  NAME         VALUE          ERROR          COEFFICIENT     COEFFICIENT  (PRIMARY)  (ALTERNATE)
1      INTERCEPT    -1.3915        +0.0724        -19.2079        -            -          -
2      N1           +0.3435        +0.0001        +3218.6915      < 0.0001     +3.1535    [+11.1063]
3      N2           -0.0081        +0.0001        -73.9053        < 0.0001     +2.7146    [+17.9385]
4      S1           +0.0002        +0.0001        +1.5301         +0.1262      +2.6630    [+17.7705]
5      S2           +0.0002        +0.0001        +1.6099         +0.1076      +3.1455    [+16.5158]
6      RM           +0.0005        +1.2244e-05    +40.4820        < 0.0001     +1.2300    +1.0116
7      AF           +0.0009        +0.0003        +3.5283         +0.0004      +1.0939    +1.0052
8      |N1|         -0.0004        +0.0002        -2.2613         +0.0239      +16.6365   [+16.1960]
9      |N2|         +0.0021        +0.0001        +15.7623        < 0.0001     +18.6616   [+18.6444]
10     |S1|         -0.0016        +0.0003        -6.0404         < 0.0001     +24.2112   [+24.2108]
11     |S2|         -0.0003        +0.0002        -1.4333         +0.1519      +24.0623   [+24.0623]
14     N1*N1        +1.2580e-06    +7.0086e-08    +17.9486        < 0.0001     +17.4855   [+16.6215]
15     N2*N2        -1.3589e-08    +5.0617e-08    -0.2685         +0.7884      +17.4322   [+17.4844]
16     S1*S1        -9.7985e-08    +1.3672e-07    -0.7167         +0.4737      +18.8256   [+18.8276]
17     S2*S2        +5.1810e-08    +1.1008e-07    +0.4707         +0.6379      +17.8914   [+17.8914]
18     RM*RM        -1.3117e-07    +2.2172e-09    -59.1601        < 0.0001     +1.1603    +1.1606
19     AF*AF        +2.6983e-06    +8.1051e-07    +3.3291         +0.0009      +1.0432    +1.0432
20     N1*|N1|      +1.2456e-06    +5.1676e-08    +24.1036        < 0.0001     +1.3995    [+11.4671]
21     N2*|N2|      +1.6883e-07    +4.5120e-08    +3.7419         +0.0002      +2.6652    [+18.2346]
22     S1*|S1|      -3.3343e-07    +9.3268e-08    -3.5750         +0.0004      +1.9403    [+11.9933]
23     S2*|S2|      -2.4281e-07    +8.6584e-08    -2.8043         +0.0051      +2.3194    [+15.1907]
26     N1*N2        -1.4454e-07    +2.2080e-08    -6.5461         < 0.0001     +1.5788    +1.5701
27     N1*S1        -1.1806e-06    +6.1998e-08    -19.0419        < 0.0001     +1.7335    +1.5130
28     N1*S2        +2.2981e-08    +5.5697e-08    +0.4126         +0.6799      +1.7532    +1.5240
29     N1*RM        -3.0356e-07    +1.9196e-08    -15.8139        < 0.0001     +1.2554    +1.0358
30     N1*AF        -1.6866e-07    +2.8800e-07    -0.5856         +0.5582      +1.1131    +1.0233
31     N2*S1        +1.1083e-07    +4.9463e-08    +2.2408         +0.0252      +1.5110    +1.5093
32     N2*S2        -1.0591e-07    +4.4563e-08    -2.3766         +0.0176      +1.5228    +1.5204
33     N2*RM        +1.3387e-08    +1.4845e-08    +0.9018         +0.3673      +1.0344    +1.0323
34     N2*AF        +9.7052e-08    +2.4748e-07    +0.3922         +0.6950      +1.0199    +1.0227
35     S1*S2        +3.2285e-08    +5.0522e-08    +0.6390         +0.5229      +1.8467    +1.8467
36     S1*RM        -8.8355e-07    +2.6275e-08    -33.6266        < 0.0001     +1.4044    +1.4048
37     S1*AF        -3.8047e-07    +4.1734e-07    -0.9117         +0.3621      +1.3182    +1.3183
38     S2*RM        -8.6731e-08    +2.5668e-08    -3.3790         +0.0007      +1.4104    +1.4108
39     S2*AF        +4.6004e-09    +3.8899e-07    +0.0118         +0.9906      +1.3182    +1.3183
40     RM*AF        -3.4890e-08    +1.2391e-07    -0.2816         +0.7783      +1.0000    +1.0001

Fig. 3c Variance inflation factors of the math model of gage output R1 for the revised permitted math model. 
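The statistics in Fig. 3c are linked by standard least-squares relations: each t-statistic is the coefficient divided by its standard error (for term 14, 1.2580e-06 / 7.0086e-08 ≈ 17.95, matching +17.9486), and the textbook variance inflation factor of a regressor is VIF_j = 1/(1 − R_j²), where R_j² comes from regressing that regressor on all the other regressors. The NumPy sketch below computes these quantities. It assumes ordinary least squares with an intercept and uses the normal approximation erfc(|t|/√2) for the two-sided p-value, which is only adequate for a large number of residual degrees of freedom (for term 4, t = 1.5301 gives about 0.126 against the tabulated 0.1262). How the PRIMARY and ALTERNATE VIF columns differ is not stated in this excerpt, so the sketch computes only the textbook VIF; `regression_metrics` is a hypothetical name.

```python
import math

import numpy as np


def regression_metrics(X: np.ndarray, y: np.ndarray):
    """Least-squares fit of y on [1, X] with coefficient statistics and VIFs.

    X holds the regressor columns (intercept NOT included), shape (n, k).
    Returns coef, std_err, t_stat, p_approx (length k + 1, intercept first)
    and vif (length k, one value per regressor column).
    """
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - (k + 1)
    sigma2 = (resid @ resid) / dof  # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)  # coefficient covariance matrix
    std_err = np.sqrt(np.diag(cov))
    t_stat = coef / std_err
    # Two-sided p-value from the normal approximation to the t distribution.
    p_approx = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stat])

    vif = np.empty(k)
    for j in range(k):
        # Regress column j on an intercept plus every other regressor column.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fit, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        r = X[:, j] - others @ fit
        centered = X[:, j] - X[:, j].mean()
        r2 = 1.0 - (r @ r) / (centered @ centered)
        vif[j] = 1.0 / (1.0 - r2)
    return coef, std_err, t_stat, p_approx, vif
```

Passing the 35 regressor columns of the R1 model would give a table with the same structure as Fig. 3c, although the VIF values need not match either printed VIF column if those use a different definition.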
[Fig. 3d table: candidate regression model terms with selection boxes for gage outputs R1 to R6; number of terms = 32, 32, 32, 32, 32, 32; all six models hierarchical (hierarchy analysis uses |F*G| = |F|*|G|, |F*F*F| = |F|*|F|*|F|, |F|*|F| = F*F).]

Fig. 3d Second revision of the permitted math model after removal of the square terms of N1, N2, S1, and S2.
(The second revision is used as the upper bound for both the original and the simplified search algorithms.)
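In Fig. 3c the square terms N1*N1, N2*N2, S1*S1, and S2*S2 have VIFs between about 17.4 and 18.8, and the absolute value terms |N1| to |S2| between about 16.6 and 24.2. One plausible contributor, offered here as an assumption rather than as the authors' stated reason, is that |F| and F*F are both even functions of a load and are therefore nearly collinear when the loads are spread symmetrically about zero. For F uniform on [−1, 1], cov(|F|, F²) = 1/12, var|F| = 1/12, and var F² = 4/45, so r² = 540/576 and the two-regressor VIF is 1/(1 − r²) = 16. The NumPy sketch below checks this on a hypothetical symmetric load sweep.

```python
import numpy as np


def pair_vif(a: np.ndarray, b: np.ndarray) -> float:
    """Two-regressor VIF: 1 / (1 - r^2), with r the Pearson correlation of a and b."""
    r = np.corrcoef(a, b)[0, 1]
    return float(1.0 / (1.0 - r * r))


# Hypothetical loading: one force swept uniformly and symmetrically over [-1, 1].
force = np.linspace(-1.0, 1.0, 20001)

print(pair_vif(np.abs(force), force**2))  # close to 16 = 1 / (1 - 540/576)
print(pair_vif(force, np.abs(force)))     # close to 1: F is odd, |F| is even
```

The contrast between the two printed values shows why an odd term such as F stays nearly orthogonal to |F| under symmetric loading, while |F| and F*F compete for the same even part of the response.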
[Fig. 4a table: optimized regression model terms for gage outputs R1 to R6; number of terms = 19, 22, 18, 17, 10, 24; all six models hierarchical.]

Fig. 4a Optimized math model obtained after application of the original search algorithm.
(The regression model search was completed after approximately 28 minutes of CPU time.)
Fig. 4b Load residuals for the optimized math model of the original search algorithm.
Fig. 5a Optimized math model obtained after application of the simplified search algorithm.
(The regression model search was completed after approximately 3 minutes of CPU time.)
Fig. 5b Load residuals for the optimized math model of the simplified search algorithm.
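Figs. 4b and 5b compare the load residuals of the two optimized models. A common way to summarize such residuals numerically is their maximum magnitude and standard deviation, often quoted as a percentage of the load capacity. The sketch below assumes that a residual is the fitted load minus the applied load and that the capacity is supplied by the user; neither the sign convention nor any capacity value is given in this excerpt, and `residual_summary` is a hypothetical helper.

```python
import numpy as np


def residual_summary(applied, fitted, capacity: float) -> dict:
    """Summarize load residuals (assumed: fitted minus applied) for one load.

    capacity is a user-supplied load capacity used to express the residual
    standard deviation in percent of capacity.
    """
    residuals = np.asarray(fitted, dtype=float) - np.asarray(applied, dtype=float)
    std = float(np.std(residuals, ddof=1))  # sample standard deviation
    return {
        "max_abs": float(np.max(np.abs(residuals))),
        "std": std,
        "std_pct_capacity": 100.0 * std / capacity,
    }
```

Applying the same helper to the residuals of both optimized models gives a like-for-like numerical comparison of the two search algorithms for each gage output.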